Abstract

The Yule-Simpson paradox notes that an association between random variables can be reversed when averaged over a background variable. Cox and Wermuth introduced a new concept of distribution dependence between two random variables X and Y, and developed two dependence conditions, each of which guarantees that reversal cannot occur. Ma, Xie and Geng studied the collapsibility of distribution dependence over a background variable W, under a rather strong homogeneity condition. Collapsibility ensures that the association remains the same for conditional and marginal models, so that Yule-Simpson reversal cannot occur. In this paper, we investigate a more general notion called A-collapsibility. The conditions of Cox and Wermuth imply A-collapsibility, without assuming homogeneity. In fact, we show that, when W is a binary variable, collapsibility is equivalent to A-collapsibility plus homogeneity, and A-collapsibility is equivalent to the conditions of Cox and Wermuth. Recently, Cox extended Cochran's result on regression coefficients of conditional and marginal models to quantile regression coefficients. The conditions of Cox and Wermuth are also sufficient for A-collapsibility of quantile regression coefficients. Under a conditional completeness assumption, they are also necessary.

Keywords. Distribution dependence, collapsibility, Yule-Simpson paradox, conditional independence, A-collapsibility, contingency table, quantile regression coefficient.



1 Introduction 

There are several ways to interpret the association between a response and an explanatory variable. Association may be measured by the odds ratio, the relative risk or the interaction parameters of the corresponding log-linear model (for categorical variables), or by a regression coefficient or distribution dependence (for continuous variables). The concept of collapsibility with respect to these parameters has been well studied by Bishop (1971), Cox (2003), Cox and Wermuth (2003), Geng (1992), Ma et al. (2006), Vellaisamy and Vijay (2007, 2008), Wermuth (1987, 1989) and Whittemore (1978), among others. Cox and Wermuth (2003) defined distribution dependence as a measure of association between two variables, and discussed the reversal effect when a background variable (sometimes unobserved) is condensed. They obtained sufficient conditions for no reversal effect, that is, for the non-occurrence of the Yule-Simpson paradox. Recently, Ma et al. (2006) proved that the conditions of Cox and Wermuth (2003) are indeed necessary and sufficient for collapsibility of distribution dependence, under the assumption that distribution dependence is homogeneous over the background variable. Homogeneity is a rather strong assumption and restricts the applicability of these conditions.

The concept of A-collapsibility for random coefficient models was introduced and discussed in Vellaisamy and Vijay (2008). In the same spirit, this paper considers (average) A-collapsibility of distribution dependence. A-collapsibility means simply that the conditional effect, averaged over the background variable, equals the corresponding marginal effect. The conditions of Cox and Wermuth (2003) are shown to be sufficient for A-collapsibility, and also necessary when W is a binary variable. A necessary condition for A-collapsibility in terms of conditional densities is also obtained. Recently, Cox (2007) extended Cochran's result on regression coefficients of conditional and marginal models to quantile regression coefficients. The conditions of Cox and Wermuth are also shown to be sufficient for A-collapsibility of quantile regression coefficients. Under a conditional completeness assumption, they are even necessary.



2 Collapsibility of Distribution Dependence 

Let X and Y be two random variables. The dependence of Y on X is called stochastically increasing if P(Y > y | X = x) is increasing in x for all y. When Y is continuous, this is equivalent to saying that the conditional distribution function F(y | x) is decreasing in x, that is,

$$\frac{\partial F(y \mid x)}{\partial x} \le 0 \tag{2.1}$$

for all y and x, with strict inequality in a region of positive probability. When X is discrete, the partial differentiation is replaced by differencing between adjacent levels of X. Assume now, for simplicity, that the variables Y, X and W are continuous. Suppose also that Y given X = x and W = w is stochastically increasing in x for all w, so that $\partial F(y \mid x, w)/\partial x \le 0$ for all y, x and w. Then,

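$$F(y \mid x) = P(Y \le y \mid X = x) = \int F(y \mid x, w)\, f(w \mid x)\, dw.$$

On differentiating with respect to x, we have

$$\frac{\partial F(y \mid x)}{\partial x} = \int \frac{\partial F(y \mid x, w)}{\partial x}\, f(w \mid x)\, dw + \int F(y \mid x, w)\, \frac{\partial f(w \mid x)}{\partial x}\, dw. \tag{2.2}$$

If $X \perp W$, then $f(w \mid x) = f(w)$ and so (Cox (2003))

$$\frac{\partial f(w \mid x)}{\partial x} = 0,$$

leading to

$$\frac{\partial F(y \mid x)}{\partial x} = \int \frac{\partial F(y \mid x, w)}{\partial x}\, f(w)\, dw. \tag{2.3}$$

Thus, when $X \perp W$, we have from (2.3)

$$\frac{\partial F(y \mid x, w)}{\partial x} \le 0 \;\Longrightarrow\; \frac{\partial F(y \mid x)}{\partial x} \le 0, \quad \text{for all } y, x \text{ and } w.$$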

Thus, Y remains stochastically increasing in x after marginalization over the covariate W. Note, however, that in general (see (2.2)) it is possible that $\partial F(y \mid x, w)/\partial x \le 0$ for all y, x and w, but $\partial F(y \mid x)/\partial x > 0$ for some y and x, since the second integral in (2.2) need not vanish when W depends on X. This is the reversal effect: the dependence of Y on X is no longer stochastically increasing. This reversal effect is known as the Yule-Simpson paradox (e.g., see Cox and Wermuth (2003)).
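The reversal can be reproduced with a small discrete analogue, in which differencing between the two levels of X replaces differentiation. The sketch below is illustrative only: the values of F(1 | x, w) = P(Y ≤ 1 | X = x, W = w) for binary Y, X and W, and the mixing weights P(W = 1 | X = x), are hypothetical. Within each stratum the dependence is stochastically increasing; averaging over W with weights that depend on X reverses the sign, whereas weights that do not depend on X (the analogue of X ⊥ W in (2.3)) preserve it.

```python
from fractions import Fraction as Fr

# Hypothetical values of F(1 | x, w) = P(Y <= 1 | X = x, W = w), keyed by (x, w).
# Within each stratum F decreases in x, so Y is stochastically increasing in x.
F_COND = {
    (1, 1): Fr(8, 10), (2, 1): Fr(7, 10),
    (1, 2): Fr(3, 10), (2, 2): Fr(2, 10),
}


def delta_conditional(w):
    """Distribution dependence Delta_x F(1 | 1, w) within the stratum W = w."""
    return F_COND[(2, w)] - F_COND[(1, w)]


def delta_marginal(p_w1_given_x):
    """Delta_x F(1 | 1) after averaging F(1 | x, w) over W.

    p_w1_given_x maps each level x of X to P(W = 1 | X = x).
    """
    def marginal_cdf(x):
        p1 = p_w1_given_x[x]
        return F_COND[(x, 1)] * p1 + F_COND[(x, 2)] * (1 - p1)

    return marginal_cdf(2) - marginal_cdf(1)


print(delta_conditional(1), delta_conditional(2))    # -1/10 -1/10
print(delta_marginal({1: Fr(1, 10), 2: Fr(9, 10)}))  # 3/10: reversal, W depends on X
print(delta_marginal({1: Fr(1, 2), 2: Fr(1, 2)}))    # -1/10: no reversal, X independent of W
```

Exact rational arithmetic is used so that the sign of each difference is not obscured by rounding.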

Let Y be a response variable, X an explanatory variable and W a background variable. The function $\partial F(y \mid x, w)/\partial x$ is called a distribution dependence function. If the variable X is categorical with support S(X) = {1, …, I}, then the distribution dependence function is defined as (see Cox (2003))

$$\frac{\partial F(y \mid x, w)}{\partial x} = \Delta_x F(y \mid i, w) = P(Y \le y \mid i + 1, w) - P(Y \le y \mid i, w), \tag{2.4}$$

for i = 1, 2, …, I − 1.
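For ordered categorical Y, (2.4) can be estimated from a contingency table by replacing each probability with a sample proportion. The following minimal sketch assumes a hypothetical data layout in which, for one stratum W = w, row k lists the counts of Y = 1, …, J among units with X = k + 1; the function names and the counts are illustrative and not taken from Cox (2003).

```python
from fractions import Fraction
from itertools import accumulate


def conditional_cdf(y_counts):
    """Empirical F(y | i, w) at each ordered level y, from counts of Y = 1, ..., J."""
    total = sum(y_counts)
    return [Fraction(c, total) for c in accumulate(y_counts)]


def distribution_dependence(stratum):
    """Empirical Delta_x F(y | i, w) = F(y | i+1, w) - F(y | i, w), as in (2.4).

    stratum[k] holds the counts of Y = 1, ..., J among units with X = k + 1
    in one stratum W = w. Returns one list (over y) for each i = 1, ..., I - 1.
    """
    cdfs = [conditional_cdf(row) for row in stratum]
    return [[hi - lo for lo, hi in zip(cdf_i, cdf_next)]
            for cdf_i, cdf_next in zip(cdfs, cdfs[1:])]


# Hypothetical stratum with I = 2 levels of X and J = 3 ordered levels of Y.
dd = distribution_dependence([[10, 20, 10], [5, 15, 20]])
print(dd)  # every entry is <= 0: Y is stochastically increasing in x in this stratum
```

The last entry of each list is always 0, because F(J | i, w) = 1 for every level i.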

The following definitions are due to Ma et al. (2006).

Definition 2.1 The distribution dependence function is said to be homogeneous with respect to W if

$$\frac{\partial F(y \mid x, w)}{\partial x} = \frac{\partial F(y \mid x, w')}{\partial x}$$

for all y, x and w ≠ w'.

Definition 2.2 The distribution dependence function is said to be collapsible over W if

$$\frac{\partial F(y \mid x, w)}{\partial x} = \frac{\partial F(y \mid x)}{\partial x}, \quad \text{for all } y, x \text{ and } w,$$

and uniformly collapsible if

$$\frac{\partial F(y \mid x, W \in A)}{\partial x} = \frac{\partial F(y \mid x)}{\partial x}$$

for all y, x and A in the support of W. When W is ordinal, the set A is of the form {i, i + 1, …}.

Note that uniform collapsibility implies collapsibility, and collapsibility implies homogeneity.

Homogeneity is commonly assumed for pooled estimation as in Mantel and Haenszel (1959). 

Ma et al. (2006) showed that the distribution dependence function is uniformly collapsible if and only if either (a) $Y \perp W \mid X$; or (b) $X \perp W$ and $\partial F(y \mid x, w)/\partial x$ is homogeneous in w. Cox and Wermuth (2003) note that either condition (a) or (b) is sufficient to ensure that no reversal effect can occur when marginalizing over the background variable W.



3 A-Collapsibility of Distribution Dependence 

A-collapsibility is a weaker criterion for non-reversal. It requires only that the conditional effect, averaged over the background variable, equals the corresponding marginal effect. In particular, it does not assume homogeneity. In fact, when W is a binary variable, collapsibility is equivalent to A-collapsibility plus homogeneity (see Theorem 3.3). Homogeneity is a strong assumption (see Ma et al. (2006), p. 129). Most of the models encountered in practice are not homogeneous (e.g., see Example 3.1). As another example, consider the simple non-linear regression

$$Y = m(X, W) + \epsilon,$$

where $m(x, w) = \alpha_1 x + \alpha_2 w + \alpha_3 x w$ and $\epsilon \sim N(0, \sigma^2)$. Then

$$\frac{\partial F(y \mid x, w)}{\partial x} = -\frac{\alpha_1 + \alpha_3 w}{\sigma}\, \phi\!\left(\frac{y - m(x, w)}{\sigma}\right),$$

where φ is the standard normal density, so that this example is not homogeneous over W.
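As a numerical check on this example, the sketch below compares the closed-form distribution dependence function with a central finite difference and shows that it changes with w. The parameter values α1 = 1, α2 = 0.5, α3 = 2 and σ = 1 are hypothetical choices for illustration; only the Python standard library is used.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical parameter values (alpha_1, alpha_2, alpha_3, sigma).
A1, A2, A3, SIGMA = 1.0, 0.5, 2.0, 1.0


def m(x, w):
    """Regression function m(x, w) = alpha_1 x + alpha_2 w + alpha_3 x w."""
    return A1 * x + A2 * w + A3 * x * w


def F(y, x, w):
    """F(y | x, w) for Y = m(X, W) + e with e ~ N(0, sigma^2)."""
    return Phi((y - m(x, w)) / SIGMA)


def dF_dx(y, x, w):
    """Closed form: -((alpha_1 + alpha_3 w) / sigma) * phi((y - m(x, w)) / sigma)."""
    return -((A1 + A3 * w) / SIGMA) * phi((y - m(x, w)) / SIGMA)


def dF_dx_numeric(y, x, w, h=1e-5):
    """Central finite-difference approximation of dF(y | x, w) / dx."""
    return (F(y, x + h, w) - F(y, x - h, w)) / (2.0 * h)


for w in (0.0, 1.0):
    # Same (y, x), different w: the distribution dependence is not homogeneous.
    print(w, dF_dx(0.5, 0.2, w), dF_dx_numeric(0.5, 0.2, w))
```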



The above observations motivate our definition of (average) A-collapsibility of distribution 
dependence, similar to A-collapsibility of regression coefficients described in Vellaisamy and 
Vijay (2008). Indeed, this seems to be a natural definition of collapsibility for a large class of 
conditional distribution functions. 

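Definition 3.1 The distribution dependence function $\partial F(y \mid x, w)/\partial x$ is (average) A-collapsible over W if

$$E_{W \mid X = x}\left(\frac{\partial F(y \mid x, W)}{\partial x}\right) = \frac{\partial F(y \mid x)}{\partial x}, \quad \text{for all } y \text{ and } x. \tag{3.1}$$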



Note that the above definition is a natural extension of simple collapsibility of distribution dependence. Indeed, when $\partial F(y \mid x, w)/\partial x$ is homogeneous over W, A-collapsibility reduces to collapsibility. The next result shows that the conditions of Cox and Wermuth (2003) are sufficient for A-collapsibility.
Theorem 3.1 (a) Either of the conditions

(i) $Y \perp W \mid X$; or

(ii) $W \perp X$

is sufficient for the distribution dependence function $\partial F(y \mid x, w)/\partial x$ to be A-collapsible over a discrete background variable W.

(b) Conversely, if W is binary, say w ∈ {1, 2}, then condition (i) or (ii) is also necessary.
Next we provide an example that is A-collapsible, but neither collapsible nor homogeneous. 

Example 3.1 Consider the following 2 × 2 × 2 table of counts.

                      W = 1   W = 2
        Y = 1  X = 1    25      35
               X = 2    75      60
        Y = 2  X = 1    35      15
               X = 2    45      40

Here, we have

$$\Delta_x F(1 \mid 1, 1) = P(Y = 1 \mid X = 2, W = 1) - P(Y = 1 \mid X = 1, W = 1) = 0.208, \quad \text{and}$$

$$\Delta_x F(1 \mid 1, 2) = P(Y = 1 \mid X = 2, W = 2) - P(Y = 1 \mid X = 1, W = 2) = -0.1.$$

That is, the distribution dependence is not homogeneous. Also, note from the marginal table of Y and X that

$$\Delta_x F(1 \mid 1) = P(Y = 1 \mid X = 2) - P(Y = 1 \mid X = 1) = 0.068 \ne \Delta_x F(1 \mid 1, w),$$

so that the distribution dependence function is not collapsible over W. However, from the marginal table of X and W,





                W = 1   W = 2
        X = 1     60      50
        X = 2    120     100



it can be seen that $X \perp W$ and

$$\begin{aligned}
E_{W \mid X = 1}\big(\Delta_x F(1 \mid 1, W)\big) &= \sum_w \Delta_x F(1 \mid 1, w)\, f_{W \mid X}(w \mid 1) \\
&= \Delta_x F(1 \mid 1, 1)\, f_W(1) + \Delta_x F(1 \mid 1, 2)\, f_W(2) = 0.068 = \Delta_x F(1 \mid 1).
\end{aligned}$$

Therefore, the distribution dependence function is A-collapsible with respect to the background variable W.
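The figures in Example 3.1 can be recomputed exactly with rational arithmetic. The sketch below encodes the counts of the 2 × 2 × 2 table as a dictionary keyed by (y, x, w), a layout chosen here only for convenience, and evaluates the conditional and marginal distribution dependence together with the average of the conditional values under f_{W|X}(w | 1).

```python
from fractions import Fraction as Fr

# Counts n(Y = y, X = x, W = w) from the 2 x 2 x 2 table of Example 3.1, keyed by (y, x, w).
COUNTS = {
    (1, 1, 1): 25, (1, 1, 2): 35, (1, 2, 1): 75, (1, 2, 2): 60,
    (2, 1, 1): 35, (2, 1, 2): 15, (2, 2, 1): 45, (2, 2, 2): 40,
}


def n(y=None, x=None, w=None):
    """Total count over the cells matching the given levels (None means any level)."""
    return sum(c for (yy, xx, ww), c in COUNTS.items()
               if (y is None or yy == y)
               and (x is None or xx == x)
               and (w is None or ww == w))


def F1(x, w=None):
    """F(1 | x, w) = P(Y = 1 | X = x, W = w), or the marginal F(1 | x) when w is None."""
    return Fr(n(y=1, x=x, w=w), n(x=x, w=w))


delta_cond = {w: F1(2, w) - F1(1, w) for w in (1, 2)}   # Delta_x F(1 | 1, w)
delta_marg = F1(2) - F1(1)                               # Delta_x F(1 | 1)
weights = {w: Fr(n(x=1, w=w), n(x=1)) for w in (1, 2)}  # f_{W|X}(w | 1)
averaged = sum(delta_cond[w] * weights[w] for w in (1, 2))

print(delta_cond[1], delta_cond[2])  # 5/24 -1/10  (about 0.208 and -0.1)
print(delta_marg, averaged)          # 3/44 3/44   (about 0.068): A-collapsible
```

The conditional differences disagree (no homogeneity) and neither equals 3/44 (no collapsibility), yet their f_{W|X}(w | 1)-weighted average reproduces the marginal difference exactly.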



Remark 3.1 Theorem 1 in Ma et al. (2006) implies that $Y \perp W \mid X$ is a sufficient condition for collapsibility. However, one can easily see that if $Y \perp W \mid X$, then

$$F(y \mid x, w) = F(y \mid x) = F(y \mid x, w'), \quad \forall\, w \ne w'. \tag{3.2}$$

Hence,

$$\frac{\partial F(y \mid x, w)}{\partial x} = \frac{\partial F(y \mid x)}{\partial x} = \frac{\partial F(y \mid x, w')}{\partial x}, \tag{3.3}$$

implying that the homogeneity condition is also satisfied. Thus, $Y \perp W \mid X$ implies homogeneity of distribution dependence. So Theorem 1 in Ma et al. (2006) pertains only to the class of distribution dependence functions that are homogeneous over W.



Note also from (2.2) that A-collapsibility holds if and only if

$$\int F(y \mid x, w)\, \frac{\partial f(w \mid x)}{\partial x}\, dw = 0, \quad \text{for all } (y, x). \tag{3.4}$$

The following example shows that A-collapsibility can hold even when neither condition (i) nor condition (ii) of Theorem 3.1 holds. Hence these conditions are not necessary unless the background variable W is binary.

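Example 3.2 Let Y, given X = x and W = w, follow the uniform distribution $U\big(0, (x^2 + (w - x)^2)^{-1}\big)$, so that

$$F(y \mid x, w) = y\,\big(x^2 + (w - x)^2\big), \quad 0 < y < \big(x^2 + (w - x)^2\big)^{-1}. \tag{3.5}$$

Assume also $(W \mid X = x) \sim N(x, 1)$, so that

$$\frac{\partial}{\partial x} f(w \mid x) = \frac{\partial}{\partial x} \phi(w - x) = (w - x)\, \phi(w - x), \tag{3.6}$$

where φ(z) denotes the density of the N(0, 1) distribution. Hence,

$$\begin{aligned}
\int F(y \mid x, w)\, \frac{\partial}{\partial x} f(w \mid x)\, dw
&= y \int \big(x^2 + (w - x)^2\big)(w - x)\, \phi(w - x)\, dw \\
&= y \left[ x^2 \int (w - x)\, \phi(w - x)\, dw + \int (w - x)^3\, \phi(w - x)\, dw \right] \\
&= y \left[ x^2 \int t\, \phi(t)\, dt + \int t^3\, \phi(t)\, dt \right] \\
&= 0, \quad \text{for all } (y, x) \in S_{yx}.
\end{aligned} \tag{3.7}$$

Thus, from (3.4), A-collapsibility over W holds, but neither condition (i) nor condition (ii) is satisfied.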
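The value of the integral in (3.7) can be checked numerically. The sketch below approximates the left side of (3.4) by composite Simpson's rule in t = w − x over a truncated range; the half-width 12 and the number of panels are arbitrary choices. It evaluates F(y | x, w) from (3.5), capped at 1 beyond the upper end of the support. Because that function is even in t while t φ(t) is odd, the integrand is odd, and on a grid symmetric about 0 the approximation should be zero up to rounding.

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def F_cond(y, x, w):
    """F(y | x, w) from (3.5), capped at 1 once y exceeds the upper end of the support."""
    return min(1.0, y * (x * x + (w - x) ** 2))


def lhs_3_4(y, x, half_width=12.0, n=24_000):
    """Composite Simpson approximation of the left side of (3.4) in Example 3.2.

    With f(w | x) = phi(w - x), (3.6) gives d f(w | x)/dx = (w - x) phi(w - x);
    the integral is written in t = w - x over [-half_width, half_width] (n even).
    """
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        t = -half_width + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * F_cond(y, x, x + t) * t * phi(t)
    return total * h / 3.0


for y, x in [(0.1, 0.5), (0.3, 1.0), (0.05, -2.0)]:
    print(y, x, lhs_3_4(y, x))  # numerically zero, in line with (3.7)
```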






The following result provides a necessary condition for A-collapsibility. It also shows that A-collapsibility of distribution dependence implies A-collapsibility of density dependence.

Theorem 3.2 Suppose F(y | x, w) and F(y | x) admit continuous mixed partial derivatives (with respect to y and x). Then a necessary condition for A-collapsibility of the distribution dependence function over W is

$$E_{W \mid X = x}\left(\frac{\partial f(y \mid x, W)}{\partial x}\right) = \frac{\partial f(y \mid x)}{\partial x}, \quad \text{for all } y \text{ and } x. \tag{3.8}$$



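Remark 3.2 (i) Note that, in Example 3.2,

$$\frac{\partial f(y \mid x, w)}{\partial x} = 2x + 2(x - w), \qquad f(w \mid x) = \phi(w - x).$$

Hence,

$$E_{W \mid x}\left(\frac{\partial f(y \mid x, W)}{\partial x}\right) = 2x + 2\big(x - E(W \mid x)\big) = 2x.$$

Also,

$$f(y \mid x) = \int f(y \mid x, w)\, f(w \mid x)\, dw \tag{3.9}$$

$$= \int \big(x^2 + (w - x)^2\big)\, \phi(w - x)\, dw \tag{3.10}$$

$$= x^2 + 1. \tag{3.11}$$

That is, $(Y \mid x) \sim U\big(0, (1 + x^2)^{-1}\big)$, so that $\partial f(y \mid x)/\partial x = 2x$. Thus, the necessary condition (3.8) is verified.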

The following result shows that collapsibility implies A-collapsibility plus homogeneity. If the background variable W is binary, then collapsibility is equivalent to A-collapsibility plus homogeneity.

Theorem 3.3 Let (i) $C_H^W$, (ii) $C^W$ and (iii) $C_A^W$ respectively denote the classes of distribution functions with distribution dependence (i) homogeneous over W, (ii) collapsible over W and (iii) A-collapsible over W. Then

$$C^W \subseteq C_A^W \cap C_H^W,$$

and equality holds when W is binary.



4 A-Collapsibility of Quantile Regression Coefficients 

Assume, for brevity, that the random variables Y, X and W are continuous with finite variances. Cochran (1938) proved the following relation for linear regression coefficients:

$$\beta_{yx} = \beta_{yx.w} + \beta_{yw.x}\, \beta_{wx}, \tag{4.1}$$

where $\beta_{yx}$ denotes the linear regression coefficient of Y on X, $\beta_{yx.w}$ denotes the corresponding coefficient of X in the regression of Y on X and W, and so forth. Equation (4.1) decomposes the effect of a unit change in X on the response variable Y into two parts: the first is the effect with W fixed, and the second is a product of two effects, namely the effect of a unit change in X on the moderating variable W times the effect of a unit change in W on the response Y when X is fixed. Cox (2007) notes that (4.1) is essentially the formula for the total derivative

$$\frac{dy}{dx} = \frac{\partial y}{\partial x} + \frac{\partial y}{\partial w}\, \frac{dw}{dx},$$

and hence could be extended to the more general setting of quantile regression coefficients, which we now describe. Given 0 < η < 1, the function $y_\eta = y_\eta(x)$ satisfying $F(y_\eta \mid x) = \eta$ is called the η-th quantile function. The function

$$q_x(y \mid x) = -\frac{\partial F(y \mid x)/\partial x}{f(y \mid x)} \tag{4.2}$$

is called the quantile regression coefficient (see equation (2) of Cox (2007)). Note that

$$\frac{d y_\eta(x)}{dx} = q_x\big(y_\eta(x) \mid x\big)$$

by implicit differentiation. Hence, the quantile regression coefficient describes the effect of a unit change in X on quantiles of Y. Similarly,

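$$q_x(y \mid x, w) = -\frac{\partial F(y \mid x, w)/\partial x}{f(y \mid x, w)} \tag{4.3}$$

represents the conditional quantile regression coefficient. Cox (2007, p. 757) established that

$$q_x(y \mid x) = E_{W \mid y, x}\{S(y \mid x, W)\}, \tag{4.4}$$

where $S(y \mid x, w) = q_x(y \mid x, w) + q_w(y \mid x, w)\, q_x(w \mid x)$ represents the total effect on quantiles of Y of a unit change in X, calculated at (x, w). Here, analogously to (4.3), $q_w(y \mid x, w) = -\{\partial F(y \mid x, w)/\partial w\}/f(y \mid x, w)$ and $q_x(w \mid x) = -\{\partial F(w \mid x)/\partial x\}/f(w \mid x)$. When S(y | x, w) does not depend on w, Cox (2007) noted that

$$q_x(y \mid x) = S(y \mid x, w), \tag{4.5}$$

a result similar to that of Cochran (1938). Our interest lies in the quantile regression coefficients $q_x(y \mid x)$ and $q_x(y \mid x, w)$.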
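Cochran's relation (4.1), to which (4.5) is compared, also holds exactly for sample least-squares slopes, provided every fit includes an intercept, so it is easy to check numerically. The sketch below uses NumPy on data simulated from a hypothetical model, Y = 1.5X − 3W + noise with W = 0.8X + noise; these coefficients are illustrative assumptions, not values from the paper.

```python
import numpy as np


def ols_slopes(response, *regressors):
    """Least-squares slopes of response on an intercept plus the given regressors."""
    design = np.column_stack([np.ones_like(response)] + list(regressors))
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef[1:]  # drop the intercept


rng = np.random.default_rng(0)
size = 1_000
x = rng.normal(size=size)
w = 0.8 * x + rng.normal(size=size)            # hypothetical: W depends on X
y = 1.5 * x - 3.0 * w + rng.normal(size=size)  # hypothetical response model

(b_yx,) = ols_slopes(y, x)            # beta_yx
b_yx_w, b_yw_x = ols_slopes(y, x, w)  # beta_yx.w and beta_yw.x
(b_wx,) = ols_slopes(w, x)            # beta_wx

print(b_yx, b_yx_w + b_yw_x * b_wx)   # the two sides of (4.1) agree up to rounding
```

With these values the conditional slope b_yx.w should be close to 1.5, while the marginal slope b_yx should be close to 1.5 − 3 × 0.8 = −0.9: the sign reverses, yet (4.1) still holds to rounding error.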

Definition 4.1 The quantile regression coefficient $q_x(y \mid x, w)$ is A-collapsible over W if

$$q_x(y \mid x) = E_{W \mid y, x}\big(q_x(y \mid x, W)\big). \tag{4.6}$$

The next result shows that the conditions (i) and (ii) of Cox and Wermuth (2003) are sufficient 
for A-collapsibility. 



Theorem 4.1 The quantile regression coefficient $q_x(y \mid x, w)$ is A-collapsible over W if (i) $Y \perp W \mid X$ or (ii) $W \perp X$.
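Theorem 4.1 can be illustrated in a Gaussian model where every quantity is available in closed form. In the hypothetical model used below, W | X = x ~ N(cx, t²) and Y | X = x, W = w ~ N(ax + bw + dxw, s²), so that q_x(y | x, w) = a + dw, q_w(y | x, w) = b + dx and q_x(w | x) = c. The sketch computes the marginal coefficient (4.2) by central differences and subtracts E_{W|y,x}{q_x(y | x, W)}. By (4.4), the difference equals E_{W|y,x}{q_w(y | x, W) q_x(W | x)} = (b + dx)c, which vanishes under condition (ii) (c = 0) or condition (i) (b = d = 0). The model and the function name are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def a_collapsibility_gap(y, x, a, b, c, d, s=1.0, t=1.0, h=1e-5):
    """q_x(y | x) - E_{W|y,x}{q_x(y | x, W)} in a hypothetical Gaussian model:

        W | X = x        ~ N(c x, t^2)
        Y | X = x, W = w ~ N(a x + b w + d x w, s^2)

    A zero gap means (4.6) holds, i.e. q_x(y | x, w) is A-collapsible over W.
    """
    def marginal_mean_sd(xx):
        # Y | X = xx is normal with this mean and standard deviation.
        slope_w = b + d * xx
        return a * xx + slope_w * c * xx, math.sqrt(slope_w ** 2 * t ** 2 + s ** 2)

    def marginal_cdf(xx):
        mean_xx, sd_xx = marginal_mean_sd(xx)
        return STD.cdf((y - mean_xx) / sd_xx)

    # Marginal quantile regression coefficient (4.2) by central differences.
    mean, sd = marginal_mean_sd(x)
    density = STD.pdf((y - mean) / sd) / sd
    q_marginal = -(marginal_cdf(x + h) - marginal_cdf(x - h)) / (2.0 * h) / density

    # Here q_x(y | x, w) = a + d w, so only E(W | Y = y, X = x) is needed;
    # given X = x, (W, Y) is bivariate normal with Cov(W, Y) = (b + d x) t^2.
    e_w = c * x + (b + d * x) * t ** 2 * (y - mean) / sd ** 2
    return q_marginal - (a + d * e_w)


print(a_collapsibility_gap(0.3, 0.7, a=1.0, b=-0.5, c=0.0, d=0.3))  # condition (ii): c = 0
print(a_collapsibility_gap(0.3, 0.7, a=1.0, b=0.0, c=0.8, d=0.0))   # condition (i): b = d = 0
print(a_collapsibility_gap(0.3, 0.7, a=1.0, b=-0.5, c=0.8, d=0.3))  # neither: gap (b + d x) c
```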

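We next show that, in general, the converse of Theorem 4.1 is not true. Let $S_{yx}$ denote the support of (Y, X). Note from (A.13) that A-collapsibility holds if and only if

$$\int q_w(y \mid x, w)\, q_x(w \mid x)\, dF(w \mid y, x) = 0, \quad \forall\, (y, x) \in S_{yx}, \tag{4.7}$$

that is, if and only if

$$\int q_w(y \mid x, w)\, q_x(w \mid x)\, \frac{f(y \mid x, w)\, f(w \mid x)}{f(y \mid x)}\, dw = \frac{1}{f(y \mid x)} \int F_w(y \mid x, w)\, F_x(w \mid x)\, dw = 0, \quad \forall\, (y, x) \in S_{yx}, \tag{4.8}$$

where $F_w(y \mid x, w) = \partial F(y \mid x, w)/\partial w$ and $F_x(w \mid x) = \partial F(w \mid x)/\partial x$.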

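Example 4.1 Let X > 0 and W be real-valued continuous random variables with

$$F(w \mid x) = \Phi\left(\frac{w}{x}\right), \quad x > 0,\ w \in \mathbb{R},$$

so that

$$F_x(w \mid x) = -\frac{w}{x^2}\, \phi\left(\frac{w}{x}\right),$$

where Φ and φ denote respectively the distribution and the density function of Z ~ N(0, 1). Also, let

$$F(y \mid x, w) = \frac{y + x - w}{2x}, \quad w - x < y < w + x,$$

so that Y, given X = x and W = w, follows the uniform distribution U(w − x, w + x), and

$$F_w(y \mid x, w) = -\frac{1}{2x}, \quad w - x < y < w + x.$$

Also,

$$\int_{-\infty}^{\infty} F_w(y \mid x, w)\, F_x(w \mid x)\, dw = \int \frac{1}{2x}\, \frac{w}{x^2}\, \phi\left(\frac{w}{x}\right) dw = \frac{1}{2x} \int_{-\infty}^{\infty} t\, \phi(t)\, dt = 0, \quad \text{for all } (y, x) \in S_{yx}.$$

Then, from (4.8), A-collapsibility holds, but neither condition (i) nor condition (ii) is satisfied.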

Finally, we identify a class of distributions for which condition (i) or condition (ii) is also 
necessary. 

Definition 4.2 The random variable W is said to be conditionally complete if $\{F(w \mid y, x) : (y, x) \in S_{yx}\}$ is complete, that is,

$$E_{W \mid y, x}\{h(W)\} = 0 \ \text{for all } (y, x) \in S_{yx} \;\Longrightarrow\; h(W) = 0 \ \text{a.e.}$$

Theorem 4.2 Let $\{F(w \mid y, x) : (y, x) \in S_{yx}\}$ be complete. Then condition (i) or (ii) of Theorem 4.1 is also necessary.

Observe that if the densities f(w | y, x) belong to a full-rank exponential family, then {F(w | y, x)} is complete.



Appendix A: Proofs 



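Proof of Theorem 3.1 Let Y and X be continuous and W be discrete. First assume Condition (i) holds. Then F(y | x, w) = F(y | x) for all w, so that

$$E_{W \mid X = x}\left(\frac{\partial F(y \mid x, W)}{\partial x}\right) = E_{W \mid X = x}\left(\frac{\partial F(y \mid x)}{\partial x}\right) = \frac{\partial F(y \mid x)}{\partial x},$$

and hence A-collapsibility holds. Assume next Condition (ii) holds. Then $f_{W \mid X}(w \mid x) = f_W(w)$, and

$$\frac{\partial F(y \mid x)}{\partial x} = \frac{\partial}{\partial x} \sum_w F(y \mid x, w)\, f_{W \mid X}(w \mid x) = \sum_w \frac{\partial F(y \mid x, w)}{\partial x}\, f_W(w) = E_{W \mid X = x}\left(\frac{\partial F(y \mid x, W)}{\partial x}\right),$$

showing again that A-collapsibility holds. The proof for W continuous is similar.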
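As to the converse, let

$$E_{W \mid X = x}\left(\frac{\partial F(y \mid x, W)}{\partial x}\right) = \frac{\partial F(y \mid x)}{\partial x}$$

hold for all y and x. Then,

$$\begin{aligned}
\sum_w \frac{\partial F(y \mid x, w)}{\partial x}\, f_{W \mid X}(w \mid x)
&= \frac{\partial}{\partial x}\Big\{\sum_w F(y \mid x, w)\, f_{W \mid X}(w \mid x)\Big\} \\
&= \sum_w f_{W \mid X}(w \mid x)\, \frac{\partial}{\partial x} F(y \mid x, w) + \sum_w F(y \mid x, w)\, \frac{\partial}{\partial x} f_{W \mid X}(w \mid x).
\end{aligned} \tag{A.1}$$

Hence,

$$\sum_w F(y \mid x, w)\, \frac{\partial}{\partial x} f_{W \mid X}(w \mid x) = 0, \quad \text{for all } x, y. \tag{A.2}$$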
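Since w ∈ {1, 2} is binary, $f_{W \mid X}(1 \mid x) + f_{W \mid X}(2 \mid x) = 1$, so that

$$\frac{\partial}{\partial x} f_{W \mid X}(2 \mid x) = -\frac{\partial}{\partial x} f_{W \mid X}(1 \mid x),$$

and hence we get from (A.2)

$$\{F(y \mid x, 1) - F(y \mid x, 2)\}\, \frac{\partial}{\partial x} f_{W \mid X}(1 \mid x) = 0, \quad \text{for all } y \text{ and } x.$$

Thus, we get F(y | x, 1) = F(y | x, 2) or $\frac{\partial}{\partial x} f_{W \mid X}(1 \mid x) = 0$, which are equivalent, respectively, to $Y \perp W \mid X$ or $X \perp W$.

To see that $\frac{\partial}{\partial x} f_{W \mid X}(1 \mid x) = 0$ implies $X \perp W$, note that P(W = 1 | X = x) is then constant in x (on a connected support of X), say equal to p. Hence P(W = 1) = p = P(W = 1 | X = x) and P(W = 2 | X = x) = 1 − p = P(W = 2) for all x, and $X \perp W$ follows.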

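Proof of Theorem 3.2 We give the proof for the case of discrete W. Assume A-collapsibility holds. Then from (A.2),

$$\sum_w F(y \mid x, w)\, \frac{\partial}{\partial x} f_{W \mid X}(w \mid x) = 0. \tag{A.3}$$

Also

$$F(y \mid x) = \sum_w F(y \mid x, w)\, f_{W \mid X}(w \mid x). \tag{A.4}$$

Differentiating (A.4) with respect to x and using (A.3), we get

$$\frac{\partial F(y \mid x)}{\partial x} = \sum_w \frac{\partial F(y \mid x, w)}{\partial x}\, f_{W \mid X}(w \mid x). \tag{A.5}$$

Differentiating (A.5) now with respect to y, we get

$$\frac{\partial^2 F(y \mid x)}{\partial y\, \partial x} = \sum_w \frac{\partial^2 F(y \mid x, w)}{\partial y\, \partial x}\, f_{W \mid X}(w \mid x). \tag{A.6}$$

Since F(y | x) has continuous mixed partial derivatives, we have

$$\frac{\partial^2 F(y \mid x)}{\partial y\, \partial x} = \frac{\partial^2 F(y \mid x)}{\partial x\, \partial y} = \frac{\partial f(y \mid x)}{\partial x}$$

(e.g., see Apostol (1962), p. 214), and similarly, since F(y | x, w) has continuous mixed partial derivatives,

$$\frac{\partial^2 F(y \mid x, w)}{\partial y\, \partial x} = \frac{\partial f(y \mid x, w)}{\partial x}.$$

Substituting the above facts in (A.6), we obtain

$$E_{W \mid X = x}\left(\frac{\partial f(y \mid x, W)}{\partial x}\right) = \frac{\partial f(y \mid x)}{\partial x}, \tag{A.7}$$

which proves the result.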

Proof of Theorem 3.3 Let $C_1$ and $C_2$ denote respectively the classes of distribution functions that satisfy conditions (i) and (ii) of Theorem 3.1. Then, from Theorem 3.1,

$$C_1 \cup C_2 \subseteq C_A^W. \tag{A.8}$$

Suppose now $F \in C^W$, so that

$$\frac{\partial F(y \mid x, w)}{\partial x} = \frac{\partial F(y \mid x)}{\partial x}, \quad \forall\, (y, x, w).$$

This implies

$$E_{W \mid X = x}\left(\frac{\partial F(y \mid x, W)}{\partial x}\right) = E_{W \mid X = x}\left(\frac{\partial F(y \mid x)}{\partial x}\right) = \frac{\partial F(y \mid x)}{\partial x},$$

and hence A-collapsibility holds. Thus, $C^W \subseteq C_A^W$. Also, by the definition of collapsibility, $C^W \subseteq C_H^W$, and so

$$C^W \subseteq C_A^W \cap C_H^W. \tag{A.9}$$

By Remark 3.1 and Theorem 1 of Ma et al. (2006), if $F \in C_H^W \cap [C_1 \cup C_2]$, then $F \in C^W$, so that

$$C_H^W \cap (C_1 \cup C_2) \subseteq C^W. \tag{A.10}$$

Let now W be binary. Then, by Theorem 3.1, $C_1 \cup C_2 = C_A^W$, and so

$$C_H^W \cap C_A^W \subseteq C^W. \tag{A.11}$$

The result now follows from (A.9) and (A.11).
Proof of Theorem 4.1 From Cox's result (4.4),

$$q_x(y \mid x) = E_{W \mid y, x}\big(q_x(y \mid x, W)\big) \tag{A.12}$$

holds if and only if

$$E_{W \mid y, x}\big(q_w(y \mid x, W)\, q_x(W \mid x)\big) = \int q_w(y \mid x, w)\, q_x(w \mid x)\, dF(w \mid y, x) = 0, \quad \forall\, (y, x) \in S_{yx}. \tag{A.13}$$

If Condition (i) holds, then, since

$$Y \perp W \mid X \iff F(y \mid x, w) = F(y \mid x) \ \text{for all } y, x \text{ and } w, \tag{A.14}$$

we have $q_w(y \mid x, w) = 0$. Hence, (A.13), and therefore (A.12), holds. If Condition (ii) $W \perp X$ holds, then

$$F(w \mid x) = F(w) \ \text{for all } (w, x) \;\Longrightarrow\; q_x(w \mid x) = 0 \ \text{for all } (w, x),$$

which in turn proves (A.12). This proves the result.

Proof of Theorem 4.2 Let A-collapsibility hold. Then, from (4.7),

$$\int q_w(y \mid x, w)\, q_x(w \mid x)\, dF(w \mid y, x) = 0, \quad \text{for all } (y, x) \in S_{yx}.$$

The conditional completeness now implies

$$q_w(y \mid x, w)\, q_x(w \mid x) = 0, \quad \text{for all } (y, x) \in S_{yx}, \tag{A.15}$$

which is equivalent to

$$q_w(y \mid x, w) = 0 \quad \text{or} \quad q_x(w \mid x) = 0.$$

That is, condition (i) or (ii) holds.